Adaptive approximate Bayesian computation for complex models
Approximate Bayesian computation (ABC) is a family of computational techniques in Bayesian statistics. These techniques make it possible to fit a model to data without computing the model likelihood; instead, they require a large number of simulations of the model being fitted. A number of refinements to the original rejection-based ABC scheme have been proposed, including the sequential improvement of posterior distributions. This technique decreases the number of model simulations required, but it still has several shortcomings that are particularly problematic for complex models that are costly to simulate. Here we provide a new algorithm for adaptive approximate Bayesian computation, which is shown to perform better on both a toy example and a complex social model.
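To make the rejection-and-refinement idea concrete, here is a minimal Python sketch of a sequential (population Monte Carlo style) ABC sampler. The Gaussian toy model, uniform prior, tolerance schedule, and kernel width are illustrative assumptions, not the algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    # Toy model: data are Gaussian with unknown mean theta.
    return rng.normal(theta, 1.0, size=n)

def distance(x, y):
    # Distance between summary statistics (here: sample means).
    return abs(x.mean() - y.mean())

observed = simulate(2.0)
tolerances = [1.0, 0.5, 0.2, 0.1]           # decreasing tolerance schedule
n_particles = 500

# Generation 0: plain rejection ABC from the prior U(-10, 10).
particles = []
while len(particles) < n_particles:
    theta = rng.uniform(-10, 10)
    if distance(simulate(theta), observed) < tolerances[0]:
        particles.append(theta)
particles = np.array(particles)
weights = np.full(n_particles, 1.0 / n_particles)

# Later generations: resample, perturb, and re-weight (importance sampling).
for eps in tolerances[1:]:
    sigma = 2.0 * particles.std()           # adaptive kernel width
    new_particles, new_weights = [], []
    while len(new_particles) < n_particles:
        theta = rng.choice(particles, p=weights) + rng.normal(0.0, sigma)
        if not -10.0 < theta < 10.0:
            continue                         # fell outside the prior support
        if distance(simulate(theta), observed) < eps:
            # Importance weight: flat prior over the kernel mixture density.
            kernel = np.exp(-0.5 * ((theta - particles) / sigma) ** 2)
            new_particles.append(theta)
            new_weights.append(1.0 / np.sum(weights * kernel))
    particles = np.array(new_particles)
    weights = np.array(new_weights) / np.sum(new_weights)

print("approximate posterior mean:", np.average(particles, weights=weights))
```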
Non-linear regression models for Approximate Bayesian Computation
Approximate Bayesian inference on the basis of summary statistics is well suited to complex problems for which the likelihood is either mathematically or computationally intractable. However, methods based on rejection suffer from the curse of dimensionality as the number of summary statistics increases. Here we propose a machine-learning approach to the estimation of the posterior density that introduces two innovations. The new method fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling. The new algorithm is compared to state-of-the-art approximate Bayesian methods and achieves a considerable reduction of the computational burden in two examples of inference, in statistical genetics and in a queueing model.
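The adjustment idea can be illustrated with a much simpler linear variant: regress the accepted parameters on the summaries and project them to the observed summary value. The sketch below uses a plain least-squares adjustment in place of the paper's nonlinear conditional heteroscedastic regression; the toy model and the 1% acceptance quantile are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta):
    x = rng.normal(theta, 1.0, size=100)
    return np.array([x.mean(), x.std()])   # two summary statistics

s_obs = simulate(1.5)

# Rejection step: keep the simulations closest to the observed summaries.
thetas = rng.uniform(-5, 5, size=20000)
summaries = np.array([simulate(t) for t in thetas])
dist = np.linalg.norm(summaries - s_obs, axis=1)
keep = dist < np.quantile(dist, 0.01)       # accept the closest 1%

# Adjustment step: regress theta on the summaries among accepted draws
# and project each accepted theta to the observed summary value.
S = summaries[keep] - s_obs                 # centred design matrix
X = np.column_stack([np.ones(S.shape[0]), S])
beta, *_ = np.linalg.lstsq(X, thetas[keep], rcond=None)
theta_adj = thetas[keep] - S @ beta[1:]     # fitted value at s_obs + residual

print("raw posterior mean     :", thetas[keep].mean())
print("adjusted posterior mean:", theta_adj.mean())
```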
Choosing summary statistics by least angle regression for approximate Bayesian computation
Bayesian statistical inference relies on the posterior distribution. Depending on the model, the posterior can be more or less difficult to derive. In recent years, there has been a lot of interest in complex settings where the likelihood is analytically intractable. In such situations, approximate Bayesian computation (ABC) provides an attractive way of carrying out Bayesian inference. To obtain reliable posterior estimates, however, it is important to keep the approximation errors in ABC small. The choice of an appropriate set of summary statistics plays a crucial role in this effort. Here, we report the development of a new algorithm, based on least angle regression, for choosing summary statistics. In two population genetic examples, the performance of the new algorithm is better than that of a previously proposed approach that uses partial least squares.
Funding: Higher Education Commission (HEC); College Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia, research group project RGP-VPP-280.
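A rough sketch of the underlying idea, assuming scikit-learn's Lars estimator: fit least angle regression of the parameter on candidate summaries from pilot simulations, and use the order in which predictors enter the path as a ranking. The toy model, the noise columns, and keeping the first k statistics are illustrative choices, not the paper's algorithm.

```python
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(2)
n_sims, n_stats = 5000, 20

theta = rng.uniform(0, 10, size=n_sims)
# Candidate summaries: a few informative ones plus pure-noise columns.
S = rng.normal(0, 1, size=(n_sims, n_stats))
S[:, 0] += theta            # informative
S[:, 1] += 0.5 * theta      # weakly informative
S[:, 2] += 0.1 * theta      # barely informative

lars = Lars(n_nonzero_coefs=n_stats).fit(S, theta)
# LARS adds predictors one at a time; the entry order ranks the
# candidate summaries by their usefulness for predicting theta.
print("entry order of summaries:", lars.active_)

k = 3
chosen = lars.active_[:k]   # use only these summaries inside ABC
print("summaries kept for ABC:", chosen)
```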
Methods for detecting associations between phenotype and aggregations of rare variants
Although genome-wide association studies have uncovered variants associated with more than 150 traits, the percentage of phenotypic variation explained by these associations remains small. This has led to the search for the "dark matter" that accounts for this missing genetic component of heritability. One potential source of this dark matter is rare variants, and several statistics have been devised to detect associations resulting from aggregations of rare variants in relatively short regions of interest, such as candidate genes. In this paper we investigate the feasibility of extending this approach in an agnostic way, considering all variants within a much broader region of interest, such as an entire chromosome or even the entire exome. Our method searches for subsets of variant sites using either Markov chain Monte Carlo or genetic algorithms. The analysis was performed with knowledge of the Genetic Analysis Workshop 17 answers.
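A minimal sketch of the subset-search idea, using a Metropolis-style walk over inclusion indicators for variant sites; the burden score, the squared-correlation objective, and the temperature are toy assumptions rather than the statistics used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n_subjects, n_sites = 400, 50

genotypes = rng.binomial(1, 0.02, size=(n_subjects, n_sites))  # rare variants
causal = [0, 1, 2]                                             # hidden truth
phenotype = genotypes[:, causal].sum(axis=1) + rng.normal(0, 1, n_subjects)

def score(mask):
    # Association score: squared correlation between the aggregate
    # burden over the selected sites and the phenotype.
    if not mask.any():
        return 0.0
    burden = genotypes[:, mask].sum(axis=1)
    if burden.std() == 0:
        return 0.0
    return np.corrcoef(burden, phenotype)[0, 1] ** 2

mask = rng.random(n_sites) < 0.5
current = score(mask)
for _ in range(20000):
    j = rng.integers(n_sites)
    mask[j] = ~mask[j]                      # propose flipping one site
    proposed = score(mask)
    # Accept uphill moves always, downhill moves with small probability.
    if proposed >= current or rng.random() < np.exp((proposed - current) / 0.01):
        current = proposed
    else:
        mask[j] = ~mask[j]                  # revert the flip

print("selected sites:", np.flatnonzero(mask), "score:", round(current, 3))
```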
Simulation-based model selection for dynamical systems in systems and population biology
Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems, several different models or hypotheses exist, and choosing which one best describes reality or the observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of, for example, signal transduction or gene regulation networks. This is particularly challenging in systems biology, where only a small number of molecular species can be assayed at any given time and all measurements are subject to measurement uncertainty. Here we develop such a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to apply the full formal apparatus of Bayesian model selection to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.
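The model selection logic can be sketched with plain rejection ABC: draw a model index from its prior, simulate under that model, and accept when the simulation lands close to the data; the accepted indices then estimate posterior model probabilities. The paper uses a sequential Monte Carlo version of this idea; the two toy count models below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(model, n=100):
    if model == 0:
        return rng.poisson(rng.uniform(0, 10), size=n)        # Poisson model
    theta = rng.uniform(0, 10)
    return rng.negative_binomial(2, 2 / (2 + theta), size=n)  # overdispersed

def summaries(x):
    return np.array([x.mean(), x.var()])

observed = rng.negative_binomial(2, 2 / (2 + 4.0), size=100)
s_obs = summaries(observed)

accepted = []
for _ in range(50000):
    m = rng.integers(2)                     # uniform prior over models
    if np.linalg.norm(summaries(simulate(m)) - s_obs) < 2.0:
        accepted.append(m)

accepted = np.array(accepted)
for m in (0, 1):
    print(f"P(model {m} | data) ~ {(accepted == m).mean():.2f}")
```

Because the observed counts are overdispersed, the accepted indices should concentrate on the negative binomial model.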
Bayesian Parameter Estimation for Latent Markov Random Fields and Social Networks
Undirected graphical models are widely used in statistics, physics and machine vision. However, Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of it has focused on the important practical case where the data consist of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first approach, particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) to avoid the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution is found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. Each algorithm used in the paper targets an approximation to the true posterior, due to the use of MCMC to simulate from the latent graphical model in lieu of being able to do so exactly in general. The supplementary appendix also describes the nature of the resulting approximation.
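For intuition, here is a sketch of the exchange algorithm for the coupling parameter of a small Ising model. As the abstract notes, the auxiliary draw should ideally be exact; a few vectorized Gibbs sweeps stand in for it here, so the sketch targets an approximate posterior. Grid size, prior, proposal, and sweep count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
L = 10  # torus side length (even, so checkerboard sweeps are valid)

def neighbours_sum(s):
    return (np.roll(s, 1, 0) + np.roll(s, -1, 0) +
            np.roll(s, 1, 1) + np.roll(s, -1, 1))

def suff_stat(s):
    # Sum of spin products over each (right, down) neighbour pair once.
    return (s * np.roll(s, 1, 0)).sum() + (s * np.roll(s, 1, 1)).sum()

def gibbs_sample(theta, sweeps=150):
    # Approximate draw from the Ising model via checkerboard Gibbs.
    s = rng.choice([-1, 1], size=(L, L))
    colour = np.indices((L, L)).sum(axis=0) % 2 == 0
    for _ in range(sweeps):
        for mask in (colour, ~colour):
            p = 1.0 / (1.0 + np.exp(-2.0 * theta * neighbours_sum(s)))
            s[mask] = np.where(rng.random((L, L))[mask] < p[mask], 1, -1)
    return s

y = gibbs_sample(0.3)                       # "observed" field, true theta = 0.3
theta, stat_y = 0.1, suff_stat(y)
samples = []
for _ in range(500):
    theta_new = theta + rng.normal(0, 0.05)
    if 0 < theta_new < 1:                   # uniform prior on (0, 1)
        w = gibbs_sample(theta_new)         # auxiliary data at theta_new
        # The intractable normalising constants cancel in this ratio of
        # unnormalised likelihoods (the key trick of the exchange algorithm).
        log_ratio = ((theta_new - theta) * stat_y
                     + (theta - theta_new) * suff_stat(w))
        if np.log(rng.random()) < log_ratio:
            theta = theta_new
    samples.append(theta)

print("posterior mean of theta ~", np.mean(samples))
```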
MSMC and MSMC2: the multiple sequentially Markovian coalescent
The Multiple Sequentially Markovian Coalescent (MSMC) is a population genetic method and software for inferring demographic history and population structure through time from genome sequences. Here we describe the main program MSMC and its successor MSMC2. We go through all the necessary steps of processing genomic data, from BAM files all the way to generating plots of inferred population size and separation histories. Some background on the methodology itself is provided, as well as bash scripts and Python source code to run the necessary programs. The reader is also referred to community resources such as a mailing list and GitHub repositories for further advice.
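As a small illustration of the final plotting step, the sketch below converts an MSMC-style output table into an effective population size curve. The file name, the column layout (time_index, left_time_boundary, right_time_boundary, and a lambda column, all scaled by the mutation rate), and the mutation rate and generation time values are assumptions; consult the MSMC documentation for the exact format produced by your run.

```python
import matplotlib.pyplot as plt
import pandas as pd

mu = 1.25e-8      # per-generation mutation rate (assumed value)
gen_time = 29     # years per generation (assumed value)

df = pd.read_csv("output.final.txt", sep=r"\s+")
# Single-population runs typically name the coalescence-rate column
# "lambda"; multi-population MSMC output uses "lambda_00" etc.
lam = df["lambda"] if "lambda" in df.columns else df["lambda_00"]

years = df["left_time_boundary"] / mu * gen_time   # scaled time -> years
ne = 1.0 / lam / (2.0 * mu)                        # scaled rate -> N_e

plt.step(years, ne, where="post")
plt.xscale("log")
plt.yscale("log")
plt.xlabel("years before present")
plt.ylabel("effective population size")
plt.savefig("msmc_popsize.png", dpi=150)
```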
ABCtoolbox: a versatile toolkit for approximate Bayesian computations
BACKGROUND: The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of approximate Bayesian computation (ABC) algorithms, which allow parameter posterior distributions to be obtained from simulations without likelihood computations. RESULTS: Here we present ABCtoolbox, a series of open-source programs for performing approximate Bayesian computations. It implements various ABC algorithms, including rejection sampling, MCMC without likelihood, a particle-based sampler, and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can interact with most simulation and summary-statistic computation programs. The usability of ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of Microtus arvalis. Using nuclear microsatellites and mitochondrial sequence data in the same estimation procedure enabled us to infer sex-specific population sizes and migration rates, and to find that males show smaller population sizes but much higher levels of migration than females. CONCLUSION: ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from sampling parameters from prior distributions, through data simulation, computation of summary statistics, estimation of posterior distributions, model choice, and validation of the estimation procedure, to visualization of the results.
Using DNA Methylation Patterns to Infer Tumor Ancestry
Background: Exactly how human tumors grow is uncertain because serial observations are impractical. One approach to reconstructing the histories of individual human cancers is to analyze the current genomic variation between their cells. The greater the variation, on average, the greater the time since the last clonal evolution cycle (a "molecular clock" hypothesis). Here we analyze passenger DNA methylation patterns from opposite sides of 12 primary human colorectal cancers (CRCs) to evaluate whether the variation (pairwise distances between epialleles) is consistent with a single clonal expansion after transformation. Methodology/Principal Findings: Data from 12 primary CRCs are compared to epigenomic data simulated under a single clonal expansion for a variety of possible growth scenarios. We find that for many different growth rates, a single clonal expansion can explain the population variation in 11 of 12 CRCs. In eight CRCs, the cells from different glands are all equally distantly related, and cells sampled from the same tumor half appear no more closely related than cells sampled from opposite tumor halves. In these tumors, growth appears consistent with a single "symmetric" clonal expansion. In three CRCs, the variation in epigenetic distances differed between sides, but this asymmetry could be explained by a single clonal expansion in which one region of the tumor underwent more cell division than the other. The variation in one CRC was complex and inconsistent with a simple single clonal expansion.
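The pairwise-distance comparison at the heart of this analysis is easy to sketch: code each epiallele as a binary methylation pattern and compare Hamming distances within and between tumor sides. The simulated patterns below are stand-ins, not the paper's data.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
n_cells, n_cpg = 8, 16

# Epialleles sampled from two sides of a tumor (toy data).
side_a = rng.binomial(1, 0.3, size=(n_cells, n_cpg))
side_b = rng.binomial(1, 0.3, size=(n_cells, n_cpg))

def mean_pairwise(patterns):
    # Mean Hamming distance over all pairs within one side.
    return np.mean([np.sum(p != q) for p, q in combinations(patterns, 2)])

def mean_between(pa, pb):
    # Mean Hamming distance over all cross-side pairs.
    return np.mean([np.sum(p != q) for p in pa for q in pb])

# Under a single "symmetric" clonal expansion, within-side and
# between-side distances should be about equal.
print("within side A :", mean_pairwise(side_a))
print("within side B :", mean_pairwise(side_b))
print("between sides :", mean_between(side_a, side_b))
```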
A Simulated Annealing Approach to Approximate Bayes Computations
Approximate Bayes Computations (ABC) are used for parameter inference when the likelihood function of the model is expensive to evaluate but relatively cheap to sample from. In particle ABC, an ensemble of particles in the product space of model outputs and parameters is propagated in such a way that its output marginal approaches a delta function at the data and its parameter marginal approaches the posterior distribution. Inspired by simulated annealing, we present a new class of particle algorithms for ABC, based on a sequence of Metropolis kernels associated with a decreasing sequence of tolerances with respect to the data. Unlike other algorithms, this class is not based on importance sampling, and hence does not suffer from a loss of effective sample size due to resampling. We prove convergence under a condition on the speed at which the tolerance is decreased. Furthermore, we present a scheme that adapts the tolerance and the jump distribution in parameter space according to mean-fields of the ensemble, which preserves the statistical independence of the particles in the limit of infinite sample size. This adaptive scheme aims to converge as close as possible to the correct result with as few system updates as possible, by minimizing the entropy production in the system. The performance of this new class of algorithms is compared against two other recent algorithms on two toy examples.
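A single-chain caricature of the idea, for intuition only: an ABC Metropolis kernel whose tolerance is annealed downward on a fixed schedule. The paper's algorithms instead evolve a whole particle ensemble and adapt the tolerance and jump distribution from mean-fields of that ensemble; the toy model, schedule, and jump size here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate(theta):
    # Cheap-to-sample model output: mean of 50 Gaussian draws.
    return rng.normal(theta, 1.0, size=50).mean()

x_obs = 2.0
theta = 0.0
samples = []

for step in range(20000):
    eps = max(1.0 * 0.9995 ** step, 0.05)   # annealed tolerance schedule
    theta_new = theta + rng.normal(0, 0.3)  # Metropolis jump
    if -10 < theta_new < 10:                # uniform prior support
        x_new = simulate(theta_new)
        # Accept when the fresh simulation lands within the current
        # tolerance of the data (hard-threshold ABC kernel).
        if abs(x_new - x_obs) < eps:
            theta = theta_new
    samples.append(theta)

burn = 10000
print("approximate posterior mean:", np.mean(samples[burn:]))
```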